Abstract
Gene synthesis attempts to assemble user-defined DNA sequences with base-level precision. Verifying the sequences of construction intermediates and the final product of a gene synthesis project is a critical part of the workflow, yet one that has received the least attention. Sequence validation is equally important for other kinds of curated clone collections. Ensuring that the physical sequence of a clone matches its published sequence is a common quality control step performed at least once over the course of a research project. GenoREAD is a web-based application that breaks the sequence verification process into two steps: the assembly of sequencing reads and the alignment of the resulting contig with a reference sequence. GenoREAD can determine if a clone matches its reference sequence. Its sophisticated reporting features help identify and troubleshoot problems that arise during the sequence verification process. GenoREAD has been experimentally validated on thousands of gene-sized constructs from an ORFeome project, and on longer sequences including whole plasmids and synthetic chromosomes. Comparing GenoREAD results with those from manual analysis of the sequencing data demonstrates that GenoREAD tends to be conservative in its diagnosis. GenoREAD is available at www.genoread.org.

INTRODUCTION

Gene synthesis (1,2) is the process of manufacturing user-defined DNA sequences with base-level precision. The limitations of the chemistries used at different steps of the process require scientists to verify the physical sequence of the clones they produce at the different stages of the assembly process. The rapid development and commercial success of new high-throughput sequencing technologies call for a careful analysis of the technology best suited to meet the sequence verification needs of gene synthesis operators.
Differences in throughput, price structure and access to sequencing resources should be considered in relation to the gene synthesis facility's throughput, the nature of the sequences it produces and other technical and economic constraints. Since the verification of thousands of 1-kb building blocks is very different from the verification of a small number of 100-kb synthetic fragments, different sequencing technologies are used at different stages of synthetic genomics projects (3). In this fast-evolving landscape of sequencing technologies, Sanger sequencing still remains the most commonly used technology for sequence verification (4,5). While more expensive per base than newer sequencing technologies, Sanger is less expensive per run, making it more relevant to the job of clone verification than it might be for a traditional genome-sized sequence verification project. Sanger remains the most cost-effective sequencing technology for most gene synthesis projects focused on assembling sequences that do not exceed a few kilobases in length. The need to verify the sequence of clones and plasmids is not limited to gene synthesis; it also applies to any plasmid containing inserts with known sequences, such as clones from ORFeome collections, irrespective of the way the plasmid was assembled.

*To whom correspondence should be addressed. Tel: +1 540 231 0403; Fax: +1 540 231 2606; Email: [email protected]
The authors wish it to be known that, in their opinion, the first two authors should be regarded as joint First Authors.
Present addresses: Yizhi Cai, Johns Hopkins University School of Medicine, High Throughput Biology Center, Baltimore, MD 21205, USA. João C. Setubal, Department of Biochemistry, University of São Paulo, São Paulo, SP 05508-000, Brazil.
Published online 4 October 2012. Nucleic Acids Research, 2013, Vol. 41, No. 1, e25. doi:10.1093/nar/gks908
© The Author(s) 2012. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

It is common practice in molecular biology to verify the sequence of a plasmid prior to publication or submission to a community resource like Addgene (6), the Registry of Standard Biological Parts (7) or the DNASU repository (8). The value of integrating sequencing data in database applications to manage large collections of biological parts is now well recognized. Since efforts to verify the collection of clones distributed to the iGEM students demonstrated the need for systematic quality control of submissions to the Registry of Standard Biological Parts (7), users of this community resource have been encouraged to upload sequencing trace files. Each read is aligned with the part’s reference sequence, and the verification status of the clone is clearly displayed for each physical distribution in the Registry. Addgene provides a similar functionality to its users by sharing results of its own internal quality controls beside the sequence provided by the depositing scientist; however, the website does not provide users with tools to easily compare the two sequences. Sequence verification is therefore a critical part of the workflow of many projects in the life sciences, yet it is probably the one that has received the least attention. In GenoREAD, this process is composed of two successive steps. The physical sequence of a clone is first determined by analysing a number of sequencing traces. In a second step, the clone’s physical sequence is compared with the clone’s expected sequence, also called the reference sequence.
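As a rough illustration of the second step (comparing a clone's determined sequence with its reference), the sketch below lists the positions where a contig differs from the reference. It is not GenoREAD's algorithm: the function name `compare_to_reference` and the status labels `"match"` and `"mismatch"` are hypothetical, and Python's general-purpose `difflib.SequenceMatcher` stands in for a proper pairwise sequence aligner. The first step, assembling trace files into a contig, is assumed to have been done already.

```python
from difflib import SequenceMatcher


def compare_to_reference(contig: str, reference: str):
    """Return (status, discrepancies) for a contig versus its reference.

    Each discrepancy is (kind, reference_position, reference_bases,
    contig_bases), with 0-based positions on the reference.
    Status labels are hypothetical, chosen for this sketch only.
    """
    contig, reference = contig.upper(), reference.upper()
    # autojunk=False stops difflib from ignoring frequent characters,
    # which matters for a four-letter alphabet on long sequences.
    matcher = SequenceMatcher(None, reference, contig, autojunk=False)
    discrepancies = []
    for kind, r1, r2, c1, c2 in matcher.get_opcodes():
        if kind != "equal":  # 'replace', 'delete' or 'insert'
            discrepancies.append((kind, r1, reference[r1:r2], contig[c1:c2]))
    status = "match" if not discrepancies else "mismatch"
    return status, discrepancies


if __name__ == "__main__":
    reference = "ATGGCCAAGTTC"
    contig = "ATGGCCTAGTTC"  # single substitution at reference position 6
    print(compare_to_reference(contig, reference))
```

A production pipeline would also have to weigh base-call quality and read coverage before declaring a discrepancy real, which this sketch deliberately ignores.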
Commercially available bioinformatics packages (VectorNTI, CLC Bio Workbench, Lasergene, and others) include algorithms necessary to perform this analysis. They can analyse trace files produced by sequencing instruments and align the output of the sequencing analysis with the reference sequence. They are often capable of batch processing large numbers of files in a single operation. However, none of these packages has a sequence verification feature. Even though sequence verification is a common problem, no commercial package provides a turnkey solution to this problem. The situation is similar with open-source packages like EMBOSS (9) or UGENE (10). The lack of a sequence verification pipeline is more than just a convenience issue. Sequencing data are often manually compared to the plasmid reference sequence; this approach can be very time-consuming and prone to human error due to operator fatigue and a lack of rigorous analysis processes. Furthermore, the outcome of the sequence verification process is dependent on the algorithms selected at each step of the process and the parameters used when calling these algorithms. There is a real possibility for mistakes to be made during the sequence verification process when performing a manual analysis. These mistakes can lead to accepting a clone with undetected mutations, or they can lead to the rejection of perfectly acceptable clones that produced less than optimal sequencing data. Since the purpose of the sequence verification step is to rule out discrepancies between a clone’s physical sequence and its expected sequence, it is critical to ensure that this step does not introduce new errors itself. This can be achieved by developing automated and validated sequence verification pipelines that can quickly and predictably analyse large collections of sequencing data with minimal user input. 
The Joint BioEnergy Institute Inventory of Composable Elements (JBEI-ICE) is an open-source software platform for managing collections of biological parts (11); it includes a feature called SequenceChecker that visually aligns sequencing data with the plasmid’s reference sequence with the goal of detecting discrepancies. SequenceChecker does not resolve conflicting reads nor does it determine the sequence verification status of the clone. CloneQC is a web-based application (12) developed to automate the sequence verification of the large number of clones generated by the Synthetic Yeast 2.0 project (13,14). CloneQC allows users to upload two archives containing the trace files and the reference sequences. The sequencing reads are automatically matched with the corresponding reference sequence using BLAST (15). The forward and reverse reads are then more precisely aligned with the reference sequence using ClustalW (16). CloneQC then takes into consideration the alignment results along with the quality of the read to assign one of several quality statuses to the clone (Pass, Fail, Check, Fixable). CloneQC was the first tool to propose a rigorous algorithm for the verification of clones generated in the context of a large-scale DNA synthesis operation. Its major limitation is that it cannot handle the verification of clones longer than the span of two Sanger sequencing reads, or about 2000 bp.

In this article, we describe GenoREAD, a new sequence verification application that breaks down the analysis process into two distinct steps: the assembly of the sequencing reads into a contig, and the alignment of the contig with the reference sequence. This approach allows GenoREAD to verify the sequence of short and long genetic constructs. The application workflow has been used on thousands of gene-sized constructs, as well as longer sequences, such as the complete sequences of plasmids and a 96-kb synthetic chromosome.
GenoREAD provides sophisticated reporting capabilities that can help users uncover various sequence verification problems. GenoREAD reports have been validated by comparing them to the results of a manual sequence verification process relying on desktop applications. This pipeline has been made available to the scientific community at www.genoread.org with the hope that it may facilitate the systematic verification of synthetic genetic constructs produced by gene synthesis and other cloning techniques.

MATERIALS AND METHODS
Publication date: 2012